[dylink] Fix deadlock in `_emscripten_dlsync_threads()` by kleisauke · Pull Request #27000 · emscripten-core/emscripten

kleisauke · 2026-05-23T08:45:06Z

This extends commit 662cb06 by ensuring _emscripten_thread_notify()
is also called for the synchronous _emscripten_dlsync_threads()
path. This ensures that threads process the dlopen catch-up queue
promptly, even when blocked in emscripten_futex_wait().

Fixes: #26913.

Split out from emscripten-core#27000.
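To make the failure mode concrete, here is a minimal sketch in plain POSIX threads rather than Emscripten's internals. It assumes a hypothetical `worker_state` in which a condition variable stands in for `emscripten_futex_wait()` and `pthread_cond_signal()` stands in for `_emscripten_thread_notify()`; none of these names come from the PR. The point is only that a synchronous caller which enqueues work and then waits must also wake a target that may already be blocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

// Hypothetical model of one background thread and its catch-up queue.
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t wake;       // the worker blocks here
  pthread_cond_t caught_up;  // the synchronous caller blocks here
  int pending;               // queued tasks not yet run
  int processed;             // tasks the worker has run
  bool done;
} worker_state;

static void* worker_main(void* arg) {
  worker_state* s = arg;
  pthread_mutex_lock(&s->lock);
  for (;;) {
    if (s->pending > 0) {
      // Drain the queue and tell the caller that we caught up.
      s->processed += s->pending;
      s->pending = 0;
      pthread_cond_broadcast(&s->caught_up);
    }
    if (s->done) break;
    // Blocked until someone wakes us; queued work alone does not.
    pthread_cond_wait(&s->wake, &s->lock);
  }
  pthread_mutex_unlock(&s->lock);
  return NULL;
}

// Enqueue one task and wait until the worker has processed it.
static void sync_enqueue(worker_state* s) {
  pthread_mutex_lock(&s->lock);
  s->pending++;
  int target = s->processed + s->pending;
  // Without this wake-up, a worker that is already blocked may never see
  // the task, and the loop below would then wait forever.
  pthread_cond_signal(&s->wake);
  while (s->processed < target) {
    pthread_cond_wait(&s->caught_up, &s->lock);
  }
  pthread_mutex_unlock(&s->lock);
}

// Runs `tasks` synchronous enqueues; returns the processed count, or -1.
static int run_wake_demo(int tasks) {
  worker_state s = {.pending = 0, .processed = 0, .done = false};
  pthread_mutex_init(&s.lock, NULL);
  pthread_cond_init(&s.wake, NULL);
  pthread_cond_init(&s.caught_up, NULL);
  pthread_t worker;
  if (pthread_create(&worker, NULL, worker_main, &s) != 0) return -1;
  for (int i = 0; i < tasks; i++) sync_enqueue(&s);
  // Ask the worker to exit, waking it in case it is blocked.
  pthread_mutex_lock(&s.lock);
  s.done = true;
  pthread_cond_signal(&s.wake);
  pthread_mutex_unlock(&s.lock);
  pthread_join(worker, NULL);
  int processed = s.processed;
  pthread_cond_destroy(&s.caught_up);
  pthread_cond_destroy(&s.wake);
  pthread_mutex_destroy(&s.lock);
  return processed;
}
```

Removing the `pthread_cond_signal(&s->wake)` call in `sync_enqueue` gives the analogue of the reported hang: the caller waits for a worker that is itself asleep.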

sbc100 · 2026-05-23T17:00:41Z

-    DBG("waking main runtime thread using _emscripten_thread_notify");
+  if (ret) {
+    // Wake up the target thread in case it is blocked in
+    // `emscripten_futex_wait`.


This is a little confusing (and maybe wasteful), though, since normal pthreads do not run their proxy queue in emscripten_yield, and therefore waking them up should not be needed.

pthreads do however call _emscripten_process_dlopen_queue from emscripten_yield, so it should only be necessary to wake them up if/when they need to call _emscripten_process_dlopen_queue, and not wherever work gets proxied to them.

I'm not sure if _emscripten_thread_notify() is actually that expensive, since it already returns early when the thread isn't waiting.

I tried limiting this notify to the dlopen() proxying queue with commit 72cb72a, but reintroducing && pthread_equal(target_thread, emscripten_main_runtime_thread_id()) regressed this again.

pthreads do however call _emscripten_process_dlopen_queue from emscripten_yield [...]

I think that's probably why reintroducing && pthread_equal(target_thread, emscripten_main_runtime_thread_id()) caused this regression again, as the queue is processed from pthreads rather than from the main runtime thread.

I'm not really worried about the cost of doing the _emscripten_thread_notify. But I would really like to know why it is needed. Because pthreads don't run their message queue on wakeup, I don't see any reason to wake them when we put something in the queue.

pthreads should only need to wake up if they need to process the dlopen queue, right?

pthreads should only need to wake up if they need to process the dlopen queue, right?

Correct, the latest revision of this PR already restricts wakeups to the dlopen queue.
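The restriction discussed above can be sketched as a single-threaded model. Everything here is hypothetical (`demo_queue`, `demo_thread`, `notify`, `proxy`); it only illustrates the shape of "wake the target when the task goes to the dlopen queue, and nowhere else", together with the early return when the target is not waiting. It is not the code from the PR.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-ins for a proxying queue and a target thread.
typedef struct { int tasks; } demo_queue;
typedef struct { bool waiting; int wakeups; } demo_thread;

static demo_queue dlopen_queue;  // the special-cased queue
static demo_queue other_queue;   // any other proxying queue

// Models a notify that is cheap when the target is not blocked.
static void notify(demo_thread* target) {
  if (!target->waiting) return;  // early return: nothing to wake
  target->waiting = false;
  target->wakeups++;
}

// Enqueue a task; wake the target only for the dlopen queue.
static void proxy(demo_queue* q, demo_thread* target) {
  q->tasks++;
  if (q == &dlopen_queue) {
    notify(target);
  }
}
```

In this model, proxying to `other_queue` leaves a blocked thread asleep, while proxying to `dlopen_queue` wakes it exactly once.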

…7001) Split out from #27000.

This extends commit 662cb06 by ensuring `_emscripten_thread_notify()` is also called for the synchronous `_emscripten_dlsync_threads()` path. This ensures that threads process the dlopen catch-up queue promptly, even when blocked in `emscripten_futex_wait()`.

To implement this cleanly, `dlopen_proxying_queue` is moved from a dynamic pointer in `dynlink.c` to a statically initialized queue in `proxying.c`.

Fixes: emscripten-core#26913.

This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (2) test expectation files were updated by running the tests with `--rebaseline`:

```
codesize/test_codesize_minimal_pthreads.json: 26180 => 26179 [-1 bytes / -0.00%]
codesize/test_codesize_minimal_pthreads_memgrowth.json: 26589 => 26588 [-1 bytes / -0.00%]

Average change: -0.00% (-0.00% - -0.00%)
```

This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (2) test expectation files were updated by running the tests with `--rebaseline`:

```
codesize/test_codesize_minimal_pthreads.json: 26158 => 26157 [-1 bytes / -0.00%]
codesize/test_codesize_minimal_pthreads_memgrowth.json: 26567 => 26566 [-1 bytes / -0.00%]

Average change: -0.00% (-0.00% - -0.00%)
```

sbc100 · 2026-05-27T21:44:17Z

I'd really like to get this fixed for the 6.0.0 release, but I'd also love to understand the issue better rather than just always waking pthreads when we add a message to their queue (which still seems like it should not be needed in theory).

This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (2) test expectation files were updated by running the tests with `--rebaseline`:

```
codesize/test_codesize_minimal_pthreads.json: 26147 => 26146 [-1 bytes / -0.00%]
codesize/test_codesize_minimal_pthreads_memgrowth.json: 26556 => 26555 [-1 bytes / -0.00%]

Average change: -0.00% (-0.00% - -0.00%)
```

kleisauke · 2026-05-28T10:39:49Z

[...] just always waking pthreads when we add a message to their queue (which still seems like it should not be needed in theory).

The latest revision of this PR already limits the _emscripten_thread_notify() call to the dlopen queue (and the PR description has been updated to reflect this change).

sbc100 · 2026-05-28T14:48:34Z

    self.set_setting('EXIT_RUNTIME')
-    self.set_setting('PROXY_TO_PTHREAD')
+    # Uncomment to test _emscripten_proxy_dlsync_async()
+    # self.set_setting('PROXY_TO_PTHREAD')


Can we run this in both modes using @parameterized ?

Good idea, done via 4ef62da.

sbc100

Nice! Thanks for all the work on this.

This solution looks correct to me. I'm a little sad that the core proxying mechanism needs to be aware of the dynlink queue, but I'm not sure I see any way around it.

sbc100 · 2026-05-28T14:52:18Z

-// `_emscripten_proxy_dlsync` below, and processed by background threads
-// that call `_emscripten_process_dlopen_queue` during futex_wait (i.e. whenever
-// they block).
-static em_proxying_queue * _Atomic dlopen_proxying_queue = NULL;


Could we leave the declaration of this queue here in this file? (and declare a getter for it?)

In fact, could we just declare the queue itself as extern in proxying.c (avoiding the getter completely)?

Revised this to use an extern declaration with commit 4ef62da.

Hmm, this seems to regress code size, see e.g. commit 4d7842b.

You mean moving it back to this file causes a regression? I think co-locating it here does make the most sense unless it's a big regression, which would seem very odd.

Never mind, the code size expectations for the main branch need to be rebaselined.
https://github.com/emscripten-core/emscripten/actions/runs/26584327538/job/78325943846?pr=27000

sbc100

lgtm % comments

sbc100 · 2026-05-28T14:54:36Z

+
+  // Proxying via the dlopen or system queue may target a thread that is
+  // currently blocked in `emscripten_futex_wait`, so explicitly wake it
+  // after enqueueing the task.


Could you leave the original sentence in place and then add a second one: "In addition, we have a special case here for the dlopen_proxying_queue...". Maybe with a TODO to find a way to avoid the need for this special case?

Revised this comment with 4ef62da.

This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (2) test expectation files were updated by running the tests with `--rebaseline`:

```
codesize/test_codesize_minimal_pthreads.json: 26146 => 26179 [+33 bytes / +0.13%]
codesize/test_codesize_minimal_pthreads_memgrowth.json: 26555 => 26588 [+33 bytes / +0.12%]

Average change: +0.13% (+0.12% - +0.13%)
```

sbc100

LGTM, but can you move the declaration of the queue back to system/lib/libc/dynlink.c?

kleisauke · 2026-05-28T15:44:40Z

but can you move the declaration of the queue back to system/lib/libc/dynlink.c?

Unfortunately, this won't work because em_proxying_queue is only forward-declared and therefore an incomplete type.

../../../system/lib/libc/dynlink.c:362:19: error: variable has incomplete type 'em_proxying_queue' (aka 'struct em_proxying_queue')
  362 | em_proxying_queue _dlopen_proxying_queue = {
      |                   ^
/home/kleisauke/emscripten/cache/sysroot/include/emscripten/proxying.h:29:16: note: forward declaration of 'struct em_proxying_queue'
   29 | typedef struct em_proxying_queue em_proxying_queue;
      |                ^
1 error generated.
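The compiler error above follows from a general C rule: a variable cannot be defined with an incomplete type, although a pointer to that type can. A minimal sketch, using a hypothetical `opaque_queue` in place of `em_proxying_queue`:

```c
#include <assert.h>
#include <stddef.h>

// Public-header style: only a forward declaration is visible here.
typedef struct opaque_queue opaque_queue;

// Fine: a pointer to an incomplete type is itself a complete object type.
static opaque_queue* queue_ptr = NULL;

// Not allowed while the type is incomplete. Uncommenting the next line at
// this point reproduces a "variable has incomplete type" error:
// static opaque_queue early_obj = {0};

// Only a translation unit that sees the full definition can define an
// object of the type and initialize it statically.
struct opaque_queue {
  int capacity;
};
static opaque_queue queue_obj = {.capacity = 8};

// Publishes the statically initialized object through the pointer.
static int queue_capacity(void) {
  queue_ptr = &queue_obj;
  return queue_ptr->capacity;
}
```

This is why a statically initialized queue has to live where the struct definition is visible, while a file that only includes the public header is limited to holding a pointer.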

sbc100 · 2026-05-28T16:24:43Z

but can you move the declaration of the queue back to system/lib/libc/dynlink.c?

Unfortunately, this won't work because em_proxying_queue is only forward-declared and therefore an incomplete type.
../../../system/lib/libc/dynlink.c:362:19: error: variable has incomplete type 'em_proxying_queue' (aka 'struct em_proxying_queue')
  362 | em_proxying_queue _dlopen_proxying_queue = {
      |                   ^
/home/kleisauke/emscripten/cache/sysroot/include/emscripten/proxying.h:29:16: note: forward declaration of 'struct em_proxying_queue'
   29 | typedef struct em_proxying_queue em_proxying_queue;
      |                ^
1 error generated.

Hmm, that's annoying. I guess we would need some kind of way to statically initialize a proxy queue.

It's not great that the dynlink system leaks into the core proxying system like this, though, so I'm not sure the advantages of the static initializer outweigh the cost to the separation of concerns here.

I'm hoping to remove all references to the dlopen queue from the core proxying.c (as a followup perhaps), so maybe just keeping this lazily initialized in dynlink.c is more conducive to that future?

sbc100 · 2026-05-28T16:25:13Z

Maybe you can use pthread_once to initialize it in its getter function?

sbc100 · 2026-05-28T16:25:53Z

Or maybe C11 call_once?
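A getter that initializes on first use with `pthread_once` might look like the sketch below. The names (`demo_queue`, `get_queue`, `init_queue`) are hypothetical, and the initializer just points at static storage; in Emscripten the initializer would presumably create the queue with `em_proxying_queue_create()` instead, which is an assumption and not something shown in this PR. C11 `call_once` with a `once_flag` from `<threads.h>` follows the same shape.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical queue type standing in for an opaque proxying queue.
typedef struct demo_queue { int id; } demo_queue;

static demo_queue queue_storage;
static demo_queue* the_queue = NULL;
static pthread_once_t queue_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;  // only here so the sketch can be checked

// Runs exactly once, no matter how many threads call get_queue().
static void init_queue(void) {
  queue_storage.id = 42;
  the_queue = &queue_storage;
  init_calls++;
}

// Lazily initialized getter. pthread_once also guarantees that every
// caller returning from it sees the effects of init_queue().
static demo_queue* get_queue(void) {
  pthread_once(&queue_once, init_queue);
  return the_queue;
}
```

With this shape, the pointer itself needs no atomic qualifier, because all reads happen after `pthread_once` has returned.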

kleisauke · 2026-05-28T16:36:18Z

Actually, we can trivially move it back to dynlink.c. See commit e178d8e. I've also updated the PR description accordingly.

Since accessing a shared variable from multiple threads without synchronization (where at least one access is a write) is always a data race and causes UB.
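The data-race concern can be illustrated with a lazily published `_Atomic` pointer, the same declaration style as the removed `dlopen_proxying_queue` line in the diff above. The compare-and-exchange publishing scheme below is an assumption chosen for illustration, with hypothetical names (`demo_queue`, `shared_queue`, `get_or_publish`); the PR may initialize the queue differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

// Hypothetical queue type.
typedef struct demo_queue { int id; } demo_queue;

// Shared between threads, so reads and writes go through atomics.
// A plain pointer written by one thread and read by another without
// synchronization would be a data race, which is undefined behavior.
static demo_queue* _Atomic shared_queue = NULL;

// Publishes `candidate` if no queue has been published yet, and returns
// whichever queue ended up being the shared one.
static demo_queue* get_or_publish(demo_queue* candidate) {
  demo_queue* expected = NULL;
  if (atomic_compare_exchange_strong(&shared_queue, &expected, candidate)) {
    return candidate;  // we won: our candidate is now the shared queue
  }
  return expected;     // we lost: `expected` now holds the winner
}
```

The first caller installs its candidate, and every later caller gets that same pointer back regardless of what it passed in.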

sbc100 · 2026-05-28T18:26:14Z

Sorry the codesize tests just got rebased. Can you rebase one more time?

sbc100 · 2026-05-28T18:54:57Z

Just to confirm, with the changes to test_pthread_dlopen, it will now fail without this fix? Is it just one variant of it in particular that requires this fix?

kleisauke · 2026-05-28T19:32:04Z

Just to confirm, with the changes to test_pthread_dlopen, it will now fail without this fix?

I can confirm that browser.test_pthread_dlopen (as well as core2.test_pthread_dlopen) will time out/fail without this fix, whereas core2.test_pthread_dlopen_proxied is unaffected.

Is it just one variant of it in particular that requires this fix?

Yup, this fix was only required for the synchronous path in _emscripten_dlsync_threads(). I guess there was previously missing test coverage for this (which is also why PR #27001 was needed).

kleisauke mentioned this pull request May 23, 2026

Possible deadlock in _emscripten_dlsync_threads() since #26659 #26913

Closed

kleisauke added a commit to kleisauke/emscripten that referenced this pull request May 23, 2026

[wasm64] Add missing sig mapping for _emscripten_proxy_dlsync()

373e07d

Split out from emscripten-core#27000.

kleisauke mentioned this pull request May 23, 2026

[wasm64] Add missing sig mapping for _emscripten_proxy_dlsync() #27001

Merged

sbc100 reviewed May 23, 2026


sbc100 pushed a commit that referenced this pull request May 23, 2026

[wasm64] Add missing sig mapping for _emscripten_proxy_dlsync() (#2…

6b67d4c

…7001) Split out from #27000.

kleisauke added 3 commits May 23, 2026 22:39

Add reproducer

227c45c

kleisauke force-pushed the issue-26913-fix branch from 72cb72a to d7be9f4 Compare May 23, 2026 20:42

kleisauke changed the title ~~[dylink] Fix deadlock in synchronous _emscripten_dlsync_threads() path~~ [dylink] Fix deadlock in _emscripten_dlsync_threads() May 23, 2026

kleisauke added 7 commits May 26, 2026 10:56

Improve comment

178727a

Improve test

f7e8ed7

Inline bool

2a44056

Appease ruff

6c04ce3

Merge branch 'main' into issue-26913-fix

e8d8f99

Merge branch 'main' into issue-26913-fix

7ef1966

kleisauke added 2 commits May 28, 2026 12:31

Merge branch 'main' into issue-26913-fix

146131d

sbc100 reviewed May 28, 2026


sbc100 approved these changes May 28, 2026


sbc100 reviewed May 28, 2026


kleisauke added 2 commits May 28, 2026 17:22

Incorporate feedback

4ef62da

sbc100 approved these changes May 28, 2026


Style

f7a6244

Move _dlopen_proxying_queue back to dynlink.c

e178d8e

kleisauke added 2 commits May 28, 2026 18:45

Re-add _Atomic

8ff1b95

Since accessing a shared variable from multiple threads without synchronization (where at least one access is a write) is always a data race and causes UB.

Reduce diff

1335f8b

sbc100 approved these changes May 28, 2026


Merge branch 'main' into issue-26913-fix

8ff5af8

sbc100 enabled auto-merge (squash) May 28, 2026 19:11

sbc100 disabled auto-merge May 28, 2026 19:36

sbc100 merged commit 0d68ae8 into emscripten-core:main May 28, 2026
37 of 39 checks passed

kleisauke deleted the issue-26913-fix branch May 28, 2026 19:40
